Search and storage of media fingerprints

ABSTRACT

Recognizing that a variety of different fingerprints may correspond to the same dataset, the search of a database of fingerprints to find a match to a target fingerprint is performed with relaxed criteria for declaring a match between two fingerprints. By matching “similar”, but not “exact”, fingerprints, redundant fingerprints need not be stored for each dataset. When a new fingerprint is found, a first-in first-out (FIFO) strategy is used to allocate space in a limited memory-space to store the new entry.

This invention relates to the field of consumer electronics, and in particular to a method and system that facilitates an efficient search and storage of digital fingerprints.

U.S. patent application US 2002/0032864 A1, “CONTENT IDENTIFIERS TRIGGERING CORRESPONDING RESPONSES”, filed 14 May 2001 for Geoffrey B. Rhoads and Kenneth L. Levy, presents a variety of techniques that are commonly used to create one or more “fingerprints” based on the contents of a dataset, such as an audio or video file, and is incorporated by reference herein. The fingerprint of a dataset is commonly used to access ancillary information related to the dataset, such as an identification of the title of the dataset, the performing artist, the composer, the director, and so on. Additionally, the fingerprint of the dataset may be used to verify access rights to the dataset and/or to assess fees associated with such access. Other uses of an identifier of a dataset based on the contents of the dataset are common in the art.

Commonly used fingerprints associated with entertainment material, such as audio and video recordings, are intended to uniquely identify the recording and, as such, are of substantial length. For example, a 128-byte format for the fingerprint of professional/commercial audio recordings is common. A database of hundreds of thousands of such fingerprints can be expected to be used for uniquely identifying commercial audio recordings, so efficient techniques for searching large identifiers in large databases are required.

Memory for saving databases of fingerprints and corresponding ancillary information can also be expected to be included in consumer entertainment equipment, and efficient storing techniques for this information will also be required.

Further complicating the task of fingerprint searching and storage, a one-to-one correspondence between a fingerprint and a dataset may not exist. A fingerprint may be based on the entire contents of the dataset, or on one or more select segments of the dataset. Because the fingerprint is based on the contents of the dataset, different samplings of the same dataset may produce different fingerprints. A search of a database of fingerprints to find a match with a currently determined fingerprint therefore often requires multiple searches through the database, based on alternative samples of the dataset, and/or a search through a database that contains multiple fingerprints for the same dataset.

Consider, for example, a database of songs, and a fingerprint creation scheme that provides an average of ten different fingerprints for the same song. The database can be constructed to contain the ten most frequently occurring fingerprints for each song, or it could be constructed to contain the single most likely fingerprint. When an as-yet-unknown dataset is sampled to produce a “search” fingerprint, it may or may not match a fingerprint in the database, either because this particular song is not included in the database, or because the song is in the database but the particular search fingerprint is not one of the fingerprints in the database for this song. When a match is not found, a new sample is typically obtained, and if a new search fingerprint is produced, this new fingerprint is used to search the database for a match. Storing the ten most frequently occurring fingerprints for each song in the database increases the likelihood of a match being found quickly, but it also requires comparing the search fingerprint to ten times as many stored fingerprints; storing only one fingerprint per song reduces the size of the database and the search time for each search fingerprint, but increases the likelihood of having to perform multiple searches using different acquired fingerprints.

Because of the likelihood of multiple fingerprints corresponding to the same song, the need for efficient search and storage techniques exists even for relatively small databases, and is particularly crucial for large databases.

It is an object of this invention to provide a method and system that facilitates a search of a database based on fingerprints that exhibit variance. It is a further object of this invention to provide a method and system that facilitates efficient storage of a fingerprint database in a limited-size memory.

These objects and others are achieved by a search that allows for a range of variance about each fingerprint, and by the use of a first-in first-out storage strategy. Recognizing that a variety of different fingerprints may correspond to the same dataset, the search of a database of fingerprints to find a match to a target fingerprint is performed with relaxed criteria for declaring a match between two fingerprints. By matching “similar”, but not “exact”, fingerprints, redundant fingerprints need not be stored for each dataset.

When a new fingerprint is found, a first-in first-out (FIFO) strategy is used to allocate space in a limited memory-space to store the new entry.

FIG. 1 illustrates an example block diagram of a search and storage system in accordance with this invention.

FIG. 2 illustrates an example flow diagram of a match-determining process in accordance with this invention.

Throughout the drawings, the same reference numeral refers to the same element, or an element that performs substantially the same function.

FIG. 1 illustrates an example block diagram of a search and storage system 100 in accordance with this invention. The system 100 includes a comparator 150 that is configured to compare a target fingerprint to select fingerprints from a database of fingerprints 140. An extractor 110 extracts the target fingerprint from a media 101, and a sequencer 120 selectively provides fingerprints from the database 140 for comparison with this target fingerprint.

In accordance with this invention, the comparator 150 is configured to determine a match between the target fingerprint and the database fingerprint based on the amount of difference between the fingerprints, and not merely on whether a difference exists. That is, the comparator 150 is configured to declare a match between the target fingerprint and the database fingerprint even if some differences exist between them. In the general case, the comparator 150 includes a difference determinator 160 that identifies the differences between the fingerprints, and a quantifier 170 that determines a measure of the amount of difference, based on the identified differences.

In the example embodiment illustrated in FIG. 1, the difference determinator 160 comprises an exclusive-OR (XOR) device that identifies each differing bit of the signatures, and the quantifier 170 comprises a lookup table (LUT) that maps the bit differences to the quantitative measure. The difference determinator 160 and quantifier 170 may be configured to effect a comparison of entire fingerprints, or they may be configured to sequentially effect comparisons of portions of the fingerprint and accumulate a running sum of the difference measures. For example, the XOR device of the difference determinator 160 may be configured to compare each byte of the fingerprints to produce a difference-byte, and the lookup table of the quantifier 170 provides a count of the number of bit differences corresponding to each difference-byte. For example, each of the difference-bytes 00000001, 00000010, 00000100, . . . 10000000 will map to a quantity value of “1”, indicating a one-bit difference. Difference-bytes 00000011, 00000101, 00000110, . . . 10100000, 11000000 will map to a quantity value of “2”, indicating a two-bit difference, and so on. In such an embodiment, the quantifier 170 maintains a running sum of the quantity values from the lookup table for each difference-byte, to provide a cumulative measure of the amount of difference between the fingerprints, which, in this example, is a count of the total number of bits that differ between the fingerprints.
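The byte-wise XOR and lookup-table scheme can be illustrated with a short Python sketch. It is a minimal example, not the patented implementation: the names `BIT_COUNT_LUT` and `count_bit_differences` are hypothetical, and fingerprints are assumed to be equal-length `bytes` objects (for example, 128 bytes).

```python
# Hypothetical 256-entry lookup table playing the role of quantifier 170:
# entry b holds the number of 1-bits in the difference-byte b.
BIT_COUNT_LUT = [bin(b).count("1") for b in range(256)]


def count_bit_differences(fp_a: bytes, fp_b: bytes) -> int:
    """Return the total number of bits that differ between two fingerprints.

    Assumes both fingerprints are bytes objects of the same length.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    running_sum = 0
    for byte_a, byte_b in zip(fp_a, fp_b):
        difference_byte = byte_a ^ byte_b              # XOR flags each differing bit
        running_sum += BIT_COUNT_LUT[difference_byte]  # LUT maps difference-byte to a count
    return running_sum


# Difference-bytes 00000001 and 11000000 contribute 1 + 2 = 3 differing bits.
print(count_bit_differences(bytes([0b00000001, 0b11000000]), bytes([0, 0])))  # 3
```

Precomputing the table trades 256 small entries of memory for a single indexed read per byte, which matches the LUT role described for the quantifier 170.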

Other methods of measuring or quantifying the amount of difference between two fingerprints will be evident to one of ordinary skill in the art in view of this disclosure. For example, if particular words within the fingerprint are more important or distinctive than other words in the fingerprint, the quantifier 170 may be configured to assign different weight to the quantitative measure that is determined for each word. In like manner, more differences may be allowable within some segments of the fingerprint than in other segments, and so on.
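A weighted measure of this kind might be sketched as follows. The 4-byte word size, the per-word weights, and the function name `weighted_difference` are illustrative assumptions; the disclosure does not specify any of them.

```python
from typing import Sequence


def weighted_difference(fp_a: bytes, fp_b: bytes,
                        weights: Sequence[float], word_size: int = 4) -> float:
    """Sum the bit differences of each word, scaled by that word's weight.

    Assumes equal-length fingerprints split into words of `word_size` bytes,
    with one (hypothetical) weight per word.
    """
    if len(fp_a) != len(fp_b) or len(fp_a) % word_size != 0:
        raise ValueError("fingerprints must be equal length and a multiple of word_size")
    if len(weights) != len(fp_a) // word_size:
        raise ValueError("need exactly one weight per word")
    total = 0.0
    for index, weight in enumerate(weights):
        start = index * word_size
        word_a = int.from_bytes(fp_a[start:start + word_size], "big")
        word_b = int.from_bytes(fp_b[start:start + word_size], "big")
        # Differences in a heavily weighted (more distinctive) word count for more.
        total += weight * bin(word_a ^ word_b).count("1")
    return total


# Word 0 differs by 1 bit (weight 2.0), word 1 by 2 bits (weight 0.5): 2.0 + 1.0.
print(weighted_difference(bytes(8), bytes([0, 0, 0, 1, 0, 0, 0, 3]), [2.0, 0.5]))  # 3.0
```

Allowing more differences in some segments than others can be expressed the same way, by giving those segments smaller weights.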

A comparator device 180 compares the quantitative measure of the differences from the quantifier 170 to a threshold value Th to determine whether a non-match is detected. If the measure of differences exceeds the threshold, a non-match is declared. In contrast to conventional devices, the threshold value of this invention is greater than zero, thereby allowing one or more differences to exist between the fingerprints without declaring a non-match. If the comparator 150 is configured to sequentially compare bytes, words, or other segments of the fingerprint, and the quantifier 170 provides a running total of the measure of differences, a non-match may be declared as soon as the running total exceeds the threshold.
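The early-terminating threshold test can be sketched as below. This is a minimal illustration under the assumption that differences are counted bit by bit, one byte at a time; `is_match` is a hypothetical name, `threshold` stands for Th, and `bin(...).count("1")` stands in for the lookup table.

```python
def is_match(fp_a: bytes, fp_b: bytes, threshold: int) -> bool:
    """Return True unless the running count of differing bits exceeds threshold."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    running_total = 0
    for byte_a, byte_b in zip(fp_a, fp_b):
        running_total += bin(byte_a ^ byte_b).count("1")
        if running_total > threshold:
            return False  # non-match declared as soon as the running total exceeds Th
    return True  # the total never exceeded Th, so a match is declared


# 8 differing bits in the first byte exceed Th = 7; the remaining bytes are never examined.
print(is_match(bytes([0xFF, 0, 0]), bytes(3), 7))  # False
```

With a threshold of zero this reduces to an exact-match test; any positive threshold tolerates that many differing bits.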

The sequencer 120 is configured to control a memory controller 130 that extracts each fingerprint from the database 140 for comparison with the target fingerprint. The term database is used herein in the general sense, to include any collection of information that facilitates retrieval of the information. The database may be stored in one or more memory devices, which may be internal or external to the system 100, or both. In a straightforward embodiment, the sequencer 120 merely provides each fingerprint from the database 140 in a sequential manner, until a match is found by the comparator 150. In a more complex embodiment, the choice of each next fingerprint from the database 140 may be based on results provided by the comparator 150. For example, if the fingerprints are stored in the database 140 in some order or pattern, the comparator 150 may be configured to provide an indication of the differences between the last fingerprint from the database and the target fingerprint. In such an embodiment, the sequencer may be configured to search sequentially using a particular increment span that is dependent upon the indicated differences. For example, if substantial differences are noted, the sequencer may use a large increment span until fewer differences are noted.
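A sequencer whose increment span depends on the indicated differences might be sketched as follows. The step rule (one extra index for every `bits_per_step` differing bits beyond the threshold) and all names are hypothetical, and the sketch assumes entries are stored so that neighbouring fingerprints tend to be similar. Because jumping can pass over a near-miss, it finishes by checking the skipped entries.

```python
from typing import List, Optional


def bit_differences(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def adaptive_sequential_search(database: List[bytes], target: bytes,
                               threshold: int, bits_per_step: int = 32) -> Optional[int]:
    """Return the index of a matching entry, or None if no entry matches."""
    visited = set()
    index = 0
    while index < len(database):
        visited.add(index)
        differences = bit_differences(database[index], target)
        if differences <= threshold:
            return index
        # Hypothetical rule: the more the current entry differs, the larger the jump.
        index += 1 + (differences - threshold) // bits_per_step
    # Skipped entries may still hold a near-miss, so check them one by one.
    for index, candidate in enumerate(database):
        if index not in visited and bit_differences(candidate, target) <= threshold:
            return index
    return None
```

Whether the jumps save any work depends entirely on how well the storage order correlates with bit-level similarity.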

Copending U.S. Patent Application, “REORDERED SEARCH OF MEDIA FINGERPRINTS”, filed Dec. 19, 2002, for Michael Epstein and Raymond Krasinski, Attorney Docket US020591 (702895), discloses advantages that can be gained by storing fingerprints in a database using a re-ordering of bytes, compared to the conventional MSB-to-LSB byte-ordering, and is incorporated by reference herein. If the fingerprints are stored in a sorted order, either conventionally or as taught in this copending application, the sequencer 120 is configured to effect an ordered search of the database for the target fingerprint (as indicated by the dashed arrow between the fingerprint extractor 110 and the sequencer 120), using conventional sort-search techniques, such as a binary search based on the sign of the difference between the prior fingerprint from the database 140 and the target fingerprint. Because the comparator 150 allows differences to exist while still declaring a match between two fingerprints, a sorted search by the sequencer 120 is modified compared to a conventional sorted search. If a match is found, the sequencer 120 terminates further searching, as in a conventional sorted search. However, if a match is not found among the samples that the sequencer 120 selects based on the particular sorted-search algorithm that is used, an exhaustive search of the database 140 may be required to ensure that a near-miss fingerprint (i.e., a fingerprint that differs from the target fingerprint by less than the threshold amount) does not exist in the database 140.
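The modified sorted search can be sketched as a binary search that accepts near-misses, followed by an exhaustive pass when the probed entries yield no match. This sketch assumes conventional MSB-to-LSB ordering, so the "sign of the difference" is taken from Python's lexicographic comparison of equal-length `bytes` (equivalent to comparing them as big-endian unsigned integers); the function names are hypothetical.

```python
from typing import List, Optional


def bit_differences(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def sorted_search(database: List[bytes], target: bytes, threshold: int) -> Optional[int]:
    """Binary search a sorted list, accepting near-miss matches; fall back to a full scan."""
    lo, hi = 0, len(database) - 1
    probed = set()
    while lo <= hi:
        mid = (lo + hi) // 2
        probed.add(mid)
        candidate = database[mid]
        if bit_differences(candidate, target) <= threshold:
            return mid  # match found: stop, as in a conventional sorted search
        # The sign of the difference selects which half to search next.
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid - 1
    # A near-miss may lie off the probed path, so scan the remaining entries.
    for index, candidate in enumerate(database):
        if index not in probed and bit_differences(candidate, target) <= threshold:
            return index
    return None


# 0x81 differs from 0x01 by one bit but sorts far away, so only the fallback scan finds it.
print(sorted_search([bytes([0x06]), bytes([0x10]), bytes([0x81])], bytes([0x01]), 1))  # 2
```

The example shows why the exhaustive pass is needed: numeric order reflects the most significant differing bit, not the number of differing bits.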

Optionally, when it is determined that a match cannot be found in the database 140, the sequencer 120 is configured to store the fingerprint, and ancillary data, in the database 140, via the memory controller 130. In a preferred embodiment of this invention, the controller 130 is configured to effect a first-in first-out strategy for adding new fingerprints, in the event that the database 140 is full. Other techniques for determining which information to remove to make room for new information will be evident to one skilled in the art, including prompting the user to manually delete a fingerprint to make room for the new fingerprint.
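A first-in first-out store of limited capacity can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest item when a new one is appended to a full deque. The class name `FifoFingerprintStore` and the shape of the ancillary data are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple

# One stored entry: (fingerprint, ancillary data such as title or artist).
Entry = Tuple[bytes, Dict[str, Any]]


class FifoFingerprintStore:
    """Fixed-capacity fingerprint store that evicts the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # A deque with maxlen drops the leftmost (oldest) item on overflow.
        self._entries: Deque[Entry] = deque(maxlen=capacity)

    def add(self, fingerprint: bytes, ancillary: Dict[str, Any]) -> Optional[Entry]:
        """Store a new entry; return the evicted entry, if any."""
        evicted = self._entries[0] if len(self._entries) == self._entries.maxlen else None
        self._entries.append((fingerprint, ancillary))
        return evicted

    def fingerprints(self) -> List[bytes]:
        """Return stored fingerprints from oldest to newest."""
        return [fingerprint for fingerprint, _ in self._entries]


store = FifoFingerprintStore(capacity=2)
store.add(b"fp-a", {"title": "A"})
store.add(b"fp-b", {"title": "B"})
print(store.add(b"fp-c", {"title": "C"}))  # (b'fp-a', {'title': 'A'})
print(store.fingerprints())                # [b'fp-b', b'fp-c']
```

Returning the evicted entry leaves room for the alternative mentioned above, such as asking the user before anything is discarded.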

FIG. 2 illustrates an example flow diagram of a match-determining process in accordance with this invention. At 210, the target fingerprint is received, and the loop 220-250 commences. At 220, a fingerprint is selected from the database, and at 230, this fingerprint is compared to the target fingerprint. As noted above, this invention allows a match to be determined between two fingerprints even if differences exist between the fingerprints. In this example embodiment, the quantitative measure that is used to evaluate the differences between signatures is the number of differences observed, such as the number of bits that differ between the signatures, or the number of words that differ between the signatures, and so on.

If, at 240, the number of differences between the signatures is greater than a threshold value, a non-match is asserted, and another signature is selected from the database, at 220, unless all of the entries in the database have been determined not to match, at 250. If all of the entries are determined not to match, at 250, the process terminates at 260, optionally by allowing the user to store the new information corresponding to the target fingerprint to the database.

If, at 240, the number of differences between the signatures is not greater than the threshold, a match is declared, and the ancillary information corresponding to the matching signature is retrieved, at 270.
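The loop of FIG. 2 can be traced in a short Python sketch, with the step numbers carried as comments. It assumes the database is a list of `(fingerprint, ancillary)` pairs and uses the bit count as the quantitative measure; `find_match` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_match(database: List[Tuple[bytes, Dict[str, Any]]], target: bytes,
               threshold: int) -> Optional[Dict[str, Any]]:
    """Follow the FIG. 2 loop; return ancillary data of the first match, else None."""
    # 210: the target fingerprint has been received.
    for stored_fingerprint, ancillary in database:       # 220: select a fingerprint
        differences = sum(bin(a ^ b).count("1")            # 230: compare (count differing bits)
                          for a, b in zip(stored_fingerprint, target))
        if differences <= threshold:                       # 240: not greater than threshold
            return ancillary                               # 270: retrieve ancillary information
        # Otherwise a non-match: return to 220 until every entry is checked (250).
    return None  # 260: no match; the caller may offer to store the target


db = [(bytes([0xFF]), {"title": "X"}), (bytes([0x03]), {"title": "Y"})]
print(find_match(db, bytes([0x01]), 1))  # {'title': 'Y'}
print(find_match(db, bytes([0x01]), 0))  # None
```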

Note, however, that because a ‘near-miss’ may be identified as a match to the target fingerprint, the near-miss may not, in fact, correspond to the target. Although not illustrated, if the retrieved information does not actually correspond to the target material (101 in FIG. 1), the user is provided the option to store the new information corresponding to the target fingerprint to the database as an addition or a replacement.

The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, the aforementioned threshold value is presented herein as a static value. One of ordinary skill in the art will recognize that ‘learning’ techniques can be applied to the system 100 to dynamically modify the threshold value to improve the performance of the system. For example, the threshold can be modified based on the observed variances among signatures for the same material. If the user repeatedly identifies a non-correspondence between matched fingerprints and targets, as discussed in the immediately prior paragraph, for example, the system 100 could be configured to reduce the threshold value, either automatically, or with the user's approval or initiation. In like manner, the threshold value may be dynamically modified based on the size of the database 140, or a classification of the contents of the database 140. In like manner, if the fingerprints are classified or ordered, different threshold values may be used for different classifications or orders. These and other system configuration and optimization features will be evident to one of ordinary skill in the art in view of this disclosure, and are included within the scope of the following claims.
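One simple form of the threshold reduction described above is sketched below. The rule (lower the threshold by one after a set number of user-reported false matches, never below a floor) and the names `AdaptiveThreshold`, `reports_per_step`, and `minimum` are hypothetical; the disclosure leaves the learning technique open.

```python
class AdaptiveThreshold:
    """Threshold that is lowered after repeated user-reported false matches."""

    def __init__(self, initial: int, minimum: int = 1, reports_per_step: int = 3) -> None:
        if not 0 < minimum <= initial:
            raise ValueError("require 0 < minimum <= initial")
        self.value = initial
        self.minimum = minimum
        self.reports_per_step = reports_per_step
        self._false_match_reports = 0

    def report_false_match(self) -> int:
        """Record that a matched fingerprint did not correspond to the target.

        Returns the (possibly reduced) threshold value.
        """
        self._false_match_reports += 1
        if self._false_match_reports >= self.reports_per_step and self.value > self.minimum:
            self.value -= 1  # tighten matching: fewer differences are tolerated
            self._false_match_reports = 0
        return self.value


threshold = AdaptiveThreshold(initial=10, minimum=8, reports_per_step=2)
print([threshold.report_false_match() for _ in range(6)])  # [10, 9, 9, 8, 8, 8]
```

Keeping the floor above zero preserves the property that one or more differences may still be tolerated.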

1. A system for searching a plurality of fingerprints for a select fingerprint that corresponds to a target fingerprint, comprising: a comparator that is configured to compare a given fingerprint to the target fingerprint, and to identify the given fingerprint as the select fingerprint when a match is determined, and, a sequencer that provides the given fingerprint from the plurality of fingerprints to the comparator, wherein the comparator is configured to determine the match based on a quantitative measure associated with differences between the given fingerprint and the target fingerprint, such that the match can be determined when one or more differences exist between the given fingerprint and the target fingerprint.
2. The system of claim 1, wherein the quantitative measure is dependent upon a count of the differences between the given fingerprint and the target fingerprint.
3. The system of claim 1, wherein the comparator is configured to determine the match by comparing the quantitative measure to a threshold value.
4. The system of claim 3, wherein the system is further configured to dynamically adjust the threshold value based on prior determinations of matches.
5. The system of claim 1, wherein the comparator includes a difference determinator that is configured to identify the differences between the given fingerprint and the target fingerprint; and a quantifier, operably coupled to the difference determinator, that is configured to determine the quantitative measure based on the identified differences.
6. The system of claim 5, wherein the difference determinator includes an exclusive-or function.
7. The system of claim 6, wherein the quantifier includes a lookup table that provides a quantity value based on the identified differences, and the quantifier determines the quantitative measure based on the quantity value.
8. The system of claim 5, wherein the quantifier includes a lookup table that provides a quantity value based on the identified differences, and the quantifier determines the quantitative measure based on the quantity value.
9. The system of claim 1, further including a memory controller that is configured to store the target fingerprint as one of the plurality of fingerprints when the match is not determined.
10. The system of claim 9, wherein the memory controller is configured to use a first-in first-out strategy to store the target fingerprint in a memory.
11. A system for searching a plurality of fingerprints for a select fingerprint that corresponds to a target fingerprint, comprising: a comparator that is configured to compare a given fingerprint to the target fingerprint, and to identify the given fingerprint as the select fingerprint when a match is determined, a sequencer that provides the given fingerprint from the plurality of fingerprints to the comparator, a memory that is configured to contain the plurality of fingerprints, and a memory controller that is configured to store the target fingerprint as one of the plurality of fingerprints in the memory when the match is not determined, using a first-in first-out (FIFO) strategy.
12. The system of claim 11, wherein the plurality of fingerprints are stored in the memory in a sorted order.
13. The system of claim 12, wherein the comparator is configured to determine the match when a number of differences between the given fingerprint and the target fingerprint is less than a threshold value that is greater than one, thereby allowing the match to be determined when one or more differences exist between the given fingerprint and the target fingerprint.
14. A method of searching a plurality of fingerprints for a matching fingerprint that corresponds to a target fingerprint, comprising: selectively comparing a given fingerprint from the plurality of fingerprints to the target fingerprint to determine whether the given fingerprint is the matching fingerprint, wherein the given fingerprint is determined to be the matching fingerprint when a number of differences between the given fingerprint and the target fingerprint is less than a threshold value that is greater than one, thereby allowing the given fingerprint to be determined to be the matching fingerprint when one or more differences exist between the given fingerprint and the target fingerprint.
15. The method of claim 14, wherein comparing the given fingerprint to the target fingerprint includes: identifying differences between the given fingerprint and the target fingerprint, and quantifying the number of differences based on the identified differences.
16. The method of claim 15, wherein identifying the differences includes effecting an exclusive-or of the given fingerprint and the target fingerprint.
17. The method of claim 16, wherein quantifying the number of differences includes accessing a lookup table to obtain a quantity value based on the identified differences.
18. The method of claim 17, wherein quantifying the number of differences includes accessing a lookup table to obtain a quantity value based on the identified differences.
19. The method of claim 14, further including storing the target fingerprint as one of the plurality of fingerprints when the matching fingerprint is not found in the plurality of fingerprints.
20. The method of claim 19, wherein storing the target fingerprint includes applying a first-in first-out strategy to store the target fingerprint in a limited-size memory.